<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
	"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Input Files for Gmaj</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<link rel="stylesheet" type="text/css" href="gmaj.css">
</head>
<body>
<p class=vvlarge>
<h2>Input Files for Gmaj</h2>
<p class=vvlarge>
TABLE OF CONTENTS
<p class=small>
<ul class=notop>
<li><a href="#intro">Introduction</a>
<li><a href="#param">Parameters File</a>
<li><a href="#zip">Compression and Bundling</a>
<li><a href="#coord">Coordinate Systems</a>
<li><a href="#align">Alignments</a>
<li><a href="#exon">Exons</a>
<li><a href="#repeat">Repeats</a>
<li><a href="#link">Linkbars</a>
<li><a href="#under">Underlays</a>
<li><a href="#high">Highlights</a>
<li><a href="#color">Color List</a>
<li><a href="#generic">Generic Annotation Formats</a>
</ul>
<p class=vlarge>

<p class=hdr>
<h3><a name="intro">Introduction</a></h3>
<p>
This page describes the input files supported by Gmaj, and their
formats.  Only the <a href="#align">alignment file</a> is
required; the others are optional.  Except where noted, all
information applies to both the stand-alone and applet modes of
Gmaj.
<p>
For annotations, Gmaj supports two broad categories of file
formats.  The original set of formats is essentially the same as
those used by <a href="http://pipmaker.bx.psu.edu/pipmaker/"
>PipMaker</a> and <a href="http://globin.bx.psu.edu/dist/laj/"
>Laj</a>, where each destination for the data (exons panel, color
underlays, etc.) has its own file format tailored for the needs of
that display.  These files can be cumbersome to prepare manually,
though PipMaker's associated utilities, such as
<a href="http://pipmaker.bx.psu.edu/piphelper/">PipHelper</a> and
the <a href="http://pipmaker.bx.psu.edu/pipmaker/tools.html"
>PipTools</a>, can significantly reduce the burden.
<p>
However, since sequence annotations are increasingly becoming
available in standardized formats from on-line resources such as
the <a href="http://genome.ucsc.edu/cgi-bin/hgTables">UCSC Table
Browser</a>, Gmaj can now accept some of these formats as well.
These are referred to here as "generic" formats because they are
not restricted to a particular biological data type or Gmaj
display panel.
<p>
The PipMaker-style formats are described below in the sections for
each panel, while the generic ones are discussed in a separate
section, <a href="#generic">Generic Annotation Formats</a>.
<p class=large>
<center>
<table width=55%>
<tr>
<td valign=top align=right><img class=lower src="hand14.gif">
<!-- Pointing hand icon is from Clip Art Warehouse,
	at http://www.clipart.co.uk/ -->
</td>
<td valign=top>
<ul class="notop nobottom lessindent">
<li>	<b>All files must consist solely of plain text ASCII
	characters.</b>&nbsp; (For example, no Word documents.)
	<p class=small>
<li>	<b>All <a href="#coord">coordinates</a> for PipMaker-style
	annotations are 1-based, closed interval.</b>&nbsp; Those
	for generic annotations may be either 1-based or 0-based
	and closed or half-open, depending on the format.
</ul>
</td>
</tr>
</table>
</center>
<p>

<p class=hdr>
<h3><a name="param">Parameters File</a></h3>
<p>
The annotation files are optional, but because in some alignments
any of the sequences can be viewed as the reference sequence,
there are potentially a large number of annotation files to
provide, too many to type their names on the command line or
paste them into a dialog box every time you want to view the data.
For this reason, Gmaj uses a meta-level <b>parameters file</b>
that lists the names of all the data files, plus a few other
data-related options.  Then when running Gmaj, you only have to
specify that one file name.  However, if you don't want to use
any of these annotations or options, you can specify a single
<a href="#align">alignment file</a> directly in place of a
parameters file.
<p>
A sample parameters file that you can use as a template is
provided at <code><a href="sample.gmaj">sample.gmaj</a></code>.
It contains detailed comments at the bottom explaining the syntax
and meaning of the parameters.
<p>

<p class=hdr>
<h3><a name="zip">Compression and Bundling</a></h3>
<p>
Gmaj supports a "bundle" option, which allows you to collect and
compress some or all of the data files into a single file in
<code>.zip</code> or <code>.jar</code> format (not
<code>.tar</code>, sorry).  This is especially useful for
streamlining the applet's data download, but is also supported in
stand-alone mode.  A few tips:
<ul>
<li>	If the <a href="#param">parameters file</a> is included in
	the bundle it must be the first file in it, since Gmaj reads
	the bundle sequentially and needs the parameters file to
	process the others.  In this case, there is no need to
	mention the parameters file on the command line or in the
	applet tags; just specify the bundle.  But if the parameters
	file is not in the bundle, specify both.
	<p class=small>
<li>	Data files in the bundle should be referred to within the
	parameters file using their plain filenames, without paths,
	and these must be unique.  Any data files outside the bundle
	should be referred to normally, using the rules described in
	<code><a href="sample.gmaj">sample.gmaj</a></code>.
	<p class=small>
<li>	Do not use filenames containing <code>/</code>,
	<code>\</code>, or <code>:</code> in the bundle.  Gmaj
	needs to remove the path that may have been added to each
	name by the zip or jar program, and since it doesn't know
	what platform that program was run on, it treats all of
	these characters as path separators.
	<p class=small>
<li>	If you are not using a parameters file (i.e., you want to
	specify the <a href="#align">alignment file</a> directly,
	without any annotations or other data-related options),
	then the alignment file must be listed in place of the
	parameters file, not as a bundle (there's nothing else
	to bundle with it anyway).
</ul>
<p>
As an alternative to bundling, data files can be compressed
individually in <code>.zip</code>, <code>.jar</code>, or
<code>.gz</code> format; this gains the compact size for storage
and transfer, but still requires overhead for multiple HTTP
connections in applet mode.  The file name must end with the
corresponding extension for the compression format to be
recognized.  (Such files can also be included in the bundle
if desired; though little if any additional compression is
typically achieved, this may be more convenient than unzipping
a large file just to bundle it.)
<p>

<p class=hdr>
<h3><a name="coord">Coordinate Systems</a></h3>
<p>
If you supply any annotations for Gmaj to display, these files
must all use position coordinates that refer to the same original
sequences identified in the MAF <a href="#align">alignment files</a>
(ignoring any display offsets specified in the <a href="#param"
>parameters file</a>).  However, even though the MAF coordinates
are 0-based, the PipMaker-style annotation files all use a
1-based, closed-interval coordinate system (i.e., the first
nucleotide in the sequence is called "1", and specified ranges
include both endpoints).  This is for consistency with PipMaker,
so the same files can be used with both programs, and the same
tools can be used to prepare them.  Coordinates for generic
annotations may be either 1-based or 0-based and closed or
half-open, depending on the format, but Gmaj always adjusts
them as needed (including the ones in the MAF files) to convert
everything to a 1-based, closed-interval system for display.
<p>

<p class=hdr>
<h3><a name="align">Alignments</a></h3>
<p>
Gmaj is designed to display multiple-sequence alignments in
<a href="http://genome.ucsc.edu/FAQ/FAQformat">MAF</a> format.
It is especially suited for sequence-symmetric alignments from
programs such as <a href="http://www.bx.psu.edu/miller_lab/"
>TBA</a>, but can also display MAF files that have a fixed
reference sequence.  (In the latter case it is a good idea to
set the <code>refseq</code> field in your <a href="#param"
>parameters file</a>, to prevent displaying the alignments with
an inappropriate reference sequence.)  It is possible to display
several alignment files simultaneously on the same plots, e.g.
for comparing output from different alignment programs.
<p>
Gmaj normally requires that each sequence name appears at most
once in each MAF block, i.e., that the values of the "src" field
are unique across all of the <code>s</code> lines within the
same block.  However, there is a special exception for the case
of pairwise self-alignments: if all of the blocks have just two
rows, then all of the sequence names can be the same.  In this
case Gmaj distinguishes the rows in each block by internally
adding a <code>~</code> suffix to the second row's sequence name;
the <code>~</code> does not show in the main display, but you may
occasionally see it in an error message.
<p>
The downside of this feature is that <b>sequence names in the MAF
files must not end with <code>~</code></b>, even for non-self
alignments.
<p>

<p class=hdr>
<h3><a name="exon">Exons</a></h3>
<p>
Each of these files lists the locations of genes, exons, and
coding regions in a particular reference sequence.  The exons
and UTRs are displayed as black and gray boxes in a separate
panel above the alignment plots.
<p>
In the PipMaker-style exons format, the directionality of a gene
(<code>&gt;</code>, <code>&lt;</code>, or <code>|</code>), its
start and end positions, and name should be on one line, followed
by an optional line beginning with a <code>+</code> character that
indicates the first and last nucleotides of the translated region
(including the initiation codon, <i>Met</i>, and the stop codon).
These are followed by lines specifying the start and end positions
of each exon, which must be listed in order of increasing address
even if the gene is on the reverse strand (<code>&lt;</code>).  By
default Gmaj will supply exon numbers, but you can override this
by specifying your own name or number for individual exons.  Blank
lines are ignored, and you can put an optional title line at the
top.  Thus, the file might begin as follows:
<pre>
     My favorite genomic region

     < 100 800 XYZZY
     + 150 750
     100 200
     600 800

     > 1000 2000 Frobozz gene
     1000 1200 exon 1
     1400 1500 alt. spliced exon
     1800 2000 exon 2

     ... etc.
</pre>
<p>

<p class=hdr>
<h3><a name="repeat">Repeats</a></h3>
<p>
Each of these files lists interspersed repeats (and possibly other
features such as CpG islands) in a particular reference sequence.
These are displayed in a separate panel just below the exons,
using the same shapes and shading as PipMaker if possible.
<p>
In the PipMaker-style repeats format, the first line identifies
this as a simplified repeats file (as opposed to
<a href="http://www.repeatmasker.org/">RepeatMasker</a> output,
which Gmaj does not yet support).  Each subsequent line specifies
the start, end, direction, and type of an individual feature.
<pre>
     %:repeats

     1081 1364 Right Alu
     1365 1405 Simple
     ... etc.
</pre>
The allowed PipMaker types are:
<code>Alu</code>, <code>B1</code>, <code>B2</code>,
<code>SINE</code>, <code>LINE1</code>, <code>LINE2</code>,
<code>MIR</code>, <code>LTR</code>, <code>DNA</code>,
<code>RNA</code>, <code>Simple</code>, <code>CpG60</code>,
<code>CpG75</code>, and <code>Other</code>.  Of these, all except
<code>Simple</code>, <code>CpG60</code>, and <code>CpG75</code>
require a direction (<code>Right</code> or <code>Left</code>).
<p>

<p class=hdr>
<h3><a name="link">Linkbars</a></h3>
<p>
Each of these files contains reference annotations, i.e.,
noteworthy regions in a particular reference sequence, which are
drawn in a separate panel as colored bars.  Typically each bar
has an associated URL pointing to a web site with more information
about the region, but this is not required.  In applet mode Gmaj
opens a new browser window to visit the linked site when the user
clicks on a bar; in stand-alone mode Gmaj is not running within
a web browser, so it just displays the URL for the user to visit
manually via copy-and-paste.
<p>
The PipMaker-style format first defines various types of links
and associates a color with each of them, then specifies the type,
position, description, and URL for each annotated region.
<pre>
     # linkbars for part of the mouse MHC class II region

     %define type
     %name PubMed
     %color Blue

     %define type
     %name LocusLink
     %color Orange

     %define annotation
     %type PubMed
     %range 1 2000
     %label Yang et al. 1997.  Daxx, a novel Fas-binding protein...
     %summary Yang, X., Khosravi-Far, R. Chang, H., and Baltimore, D. (1997).
       Daxx, a novel Fas-binding protein that activates JNK and apoptosis.
       Cell 89(7):1067-76.
     %url http://www.ncbi.nlm.nih.gov:80/entrez/
     query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9215629&dopt=Abstract

     ... etc.
</pre>
Here, for example, the first stanza requests that each feature
subsequently identified as a PubMed entry be colored blue.
The name must be a single word, perhaps containing underline
characters (e.g., <code>Entry_in_GenBank</code>), and the color
must come from Gmaj's <a href="#color">color list</a>.
<p>
The third stanza associates a PubMed link with positions
1-2000 in this sequence.  The label should be kept fairly
short, as it will be displayed on Gmaj's position indicator line
when the user points at this linkbar.  The summary is optional;
it is used only by PipMaker and will be ignored by Gmaj.  Also,
while PipMaker allows several summary/URL pairs within a single
annotation, Gmaj expects each field to occur at most once.  If
Gmaj encounters extra URLs, it will just use the first one and
display a warning message.
<p>
Note that summaries and URLs (but not labels) can be broken into
several lines for convenience; the line breaks are removed when
the file is read, but they are not replaced with spaces.  Thus
a continuation line for a summary typically begins with a space
to separate it from the last word of the previous line, while
a URL continuation does not.
<p>
Also note that stanzas should be separated by blank lines, and
lines beginning with a <code>#</code> character are comments
that will be ignored.  The linkbars can appear in the file in
any order, and several can overlap at the same position with no
problem, since Gmaj will display them in multiple rows if
necessary.  In PipMaker this format is called "annotations with
hyperlinks".
<p>

<p class=hdr>
<h3><a name="under">Underlays</a></h3>
<p>
Each of these files specifies underlays (colored bands) to be
painted on a particular pairwise pip and its corresponding
dotplot.  The bands are specified as regions in the reference
sequence and are normally drawn vertically; however for a dotplot,
Gmaj will also look to see if you have specified an underlay file
for the transposed situation where the reference and secondary
sequences are swapped, and if so, will draw those underlays as
horizontal bands in the secondary sequence.
<p>
The PipMaker-style underlay format supported by Gmaj looks like
this:
<pre>
     # partial underlays for the BTK region

     LightYellow Gene
     Green Exon
     Red Strongly_conserved

     35324 72009 (BTK gene) Gene
     49781 49849 (exon 4) Exon
     51403 51484 Exon
     50350 50513 (conserved 84%) Strongly_conserved 84
     52376 52603 (Kilroy was here) Strongly_conserved 92 +
     ... etc.
</pre>
The first group of lines describes the intended meaning of the
colors, while the second group specifies the location of each band.
Colors must come from Gmaj's <a href="#color">color list</a>, but
the meaning of each color can be any single word chosen by you.
The text in parentheses is an optional label which will be
displayed on Gmaj's position indicator line when the user points
the mouse at that band.  The parentheses must be present if the
label is, and the label itself cannot contain any additional
parentheses.  The number following the color category is an
optional integer score that can be used to interactively adjust
which underlays are displayed; see "Underlays Box" in the
Menus and Widgets section of <a href="gmaj_help.html"
>Starting and Running Gmaj</a> for more information.  (The
label and score are extra features not supported by PipMaker.)
A <code>+</code> or <code>-</code> character at the end of a
location line will paint just the upper or lower half of the band
on the pip (but is ignored for dotplots).  This allows you to
differentiate between the two strands, or to plot potentially
overlapping features like gene predictions and database matches.
<p>
Note that if two bands overlap, the one that was specified last
in the file appears "on top" and obscures the earlier one (except
for the special <code><a href="#hatch">Hatch</a></code> color).
Thus in this example, the green exons and red strongly conserved
regions cover up parts of the long yellow band representing the
gene.  As in the links file, lines beginning with a <code>#</code>
character are comments that will be ignored.
<p>

<p class=hdr>
<h3><a name="high">Highlights</a></h3>
<p>
Highlight files are analogous to the <a href="#under">underlay</a>
files, but each of these specifies colored regions for a
particular sequence in the text view, rather than for a plot.
If you do not specify a highlight file for a particular sequence,
Gmaj will automatically provide default highlights based on the
<a href="#exon">exons</a> file (if you provided one).  These will
use one color for whole genes, overlaid with different colors to
indicate exons on the forward vs. reverse strand.  If the exons
file specifies a gene's translated region, then the 5&acute; and
3&acute; UTRs will be shaded using lighter colors.  These default
highlights make it easy to examine the putative start/stop codons
and splice junctions, as well as providing a visual connection
between the graphical and text views.  But if for some reason you
do not want any text highlights, you can suppress them by
specifying an empty highlight file.
<p>
The PipMaker-style format for highlights is the same as for
underlays, except that any <code>+</code> or <code>-</code>
indicators will be ignored, and the <code>Hatch</code> color is
not supported for highlights.  Just as with underlays, labels
can be included which will be shown when the user points at
the highlight, scores can be used to limit which entries are
displayed, and highlights that are listed later in the file will
cover up those that appear earlier.
<p>

<p class=hdr>
<h3><a name="color">Color List</a></h3>
<p>
For Gmaj's PipMaker-style annotations, the available colors are:
<pre>
    Black   White        Clear
    Gray    LightGray    DarkGray
    Red     LightRed     DarkRed
    Green   LightGreen   DarkGreen
    Blue    LightBlue    DarkBlue
    Yellow  LightYellow  DarkYellow
    Pink    LightPink    DarkPink
    Cyan    LightCyan    DarkCyan
    Purple  LightPurple  DarkPurple
    Orange  LightOrange  DarkOrange
    Brown   LightBrown   DarkBrown
</pre>
These names are case-sensitive (i.e., capitalization matters).
Not all of these are supported by PipMaker.  Also, be aware that
the appearance of the colors may vary between PipMaker and Gmaj,
and from one printer or monitor to the next.
<p class=subhdr>
<a name="hatch"><b><code>Hatch</code></b></a>
<p>
In addition to the regular colors listed above, Gmaj supports a
special "color" for underlays called <code>Hatch</code>, which
is drawn as a pattern of diagonal gray lines.  Normally if two
underlays overlap, the one that was specified last in the file
appears "on top" and obscures the earlier one.  However,
<code>Hatch</code> underlays have the special property that they
are always drawn after the other colors, and since the space
between the diagonal lines is transparent, they allow the other
colors to show through.  Currently <code>Hatch</code> is only
supported for underlays, not for highlights or linkbars.
<p>

<p class=hdr>
<h3><a name="generic">Generic Annotation Formats</a></h3>
<p>
The standardized generic formats currently supported by Gmaj
include
<a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml"
>GFF</a> (v1 & v2),
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GTF"
>GTF</a>, and various flavors of
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED"
>BED</a> (including the full BED12 format, a.k.a. "gene BED").
For details on these formats, please see the specifications at
the above links; this document will mainly discuss their use
by Gmaj.
<p>
These formats are all <b>tab-separated</b>, and despite their
differences are similar enough that Gmaj can extract comparable
fields and treat them more or less the same.  Note that Gmaj is
not intended as a format validator: parsing is more lenient in
some respects than the official format specifications, and Gmaj
will ignore fields it has no use for.  Also, interpretation of
these open-ended formats depends partly on what type of annotation
is expected; e.g. if Gmaj is trying to read exons from a GFF v1
file, it will assume that the group field is the gene name.  It
will generally show warning messages to keep the user apprised
of any such assumptions it is making (if these become too annoying
they can be individually suppressed in the <a href="#param"
>parameters file</a>; see <code><a href="sample.gmaj"
>sample.gmaj</a></code> for details).  Because one of the main
reasons for supporting these formats is to enable the use of
annotation files obtained from public sources, Gmaj tries not to
balk at anomalies that are probably not the user's fault, and
when practical will simply skip questionable items with a warning
message.  Each type of message will generally be displayed only
once, and not repeated for every item with the same problem.
<p>
<p class=subhdr>
<a name="fileext"><b>Filename Extensions</b></a>
<p>
In order to distinguish generic files from PipMaker-style ones
and handle them appropriately, Gmaj requires that files in
generic formats have names ending with any of certain extensions.
The default list is <code>.gff</code>, <code>.gtf</code>,
<code>.bed</code>, <code>.ct</code>, and <code>.trk</code>, but
this can be customized (see <code><a href="sample.gmaj"
>sample.gmaj</a></code>).
<p>
<p class=subhdr>
<a name="quote"><b>Quoting</b></a>
<p>
Some of the generic formats require text values to be enclosed
in double quotes (<code>" "</code>).  Even when not strictly
required it is usually a good idea to do so, especially if the
value contains spaces.  The official specifications generally
don't say what to do if a value contains embedded quote
characters, but Gmaj supports a rudimentary mechanism for
escaping them with a backslash (<code>\</code>).  However it
does not provide for escaping the backslash: quoted values
should not end with <code>\</code> (insert a space before the
final quote if necessary).
<p>
<p class=subhdr>
<a name="empty"><b>Empty Fields</b></a>
<p>
When reading the generic formats, Gmaj treats two adjacent tab
characters as an empty field.  However, your files will be easier
for humans to read if you do not leave fields completely empty.
Gmaj recognizes a value of <code>.</code> (the dot character)
to mean "unspecified" for fields such as strand, score, feature,
and color, in some cases even when the official formats don't.
For instance, GFF v2 explicitly calls for using <code>.</code>
when there is no score, but Gmaj allows you to do this with the
other generic formats as well, in order to distinguish between
"no score" and a score that is truly zero.  For colors, in
addition to <code>.</code> Gmaj also interprets <code>0</code>
to mean "unspecified", in keeping with examples at UCSC.
<p>
<p class=subhdr>
<a name="gencoord"><b>Coordinates</b></a>
<p>
The GFF and GTF formats use 1-based, closed-interval coordinates
(i.e., sequence numbering starts with "1", and specified ranges
include both endpoints), while BED uses a 0-based, half-open
system (the first nucleotide of the sequence is numbered "0",
and the ending position is not included in the region).  For all
of these formats, positions are given relative to the beginning
of the named sequence regardless of which strand the feature is
on (unlike MAF), and <code>start</code> must be less than or
equal to <code>end</code>.
<p>
<p class=subhdr>
<a name="gffconv"><b>GFF Conventions</b></a>
<p>
BED format is relatively fixed in how its fields are used, but
GFF and GTF are more variable and require additional conventions
for most effective use with Gmaj.  In particular, the values of
the "feature" field and the optional "attributes" affect how Gmaj
will interpret and display an item.
<p>
Values of the feature field that are recognized for special
treatment include:
<p class=tiny>
<ul class="notop nobottom">
<li>	<code>gene</code> or values starting with <code>gene_</code>
<li>	<code>exon</code> or values starting with <code>exon_</code>
<li>	<code>start_codon</code>, <code>str_codon</code>,
	<code>stop_codon</code>, <code>stp_codon</code>, or
	<code>cds</code>
<li>	<code>repeatmasker</code> or any of the 
	<a href="#repeat">PipMaker repeat or CpG types</a>
</ul>
<p class=tiny>
Of these, only the PipMaker types are case-sensitive.
<p>
For GFF v2 and GTF, the currently recognized attribute tags are:
<p class=tiny>
<ul class="notop nobottom">
<li>	<code>gene</code> or <code>gene_id</code>: the name of the
	gene, e.g. for grouping exons (<code>transcript_id</code> is
	ignored)
<li>	<code>name</code>: an optional name for this individual item,
	e.g. for an exon label
<li>	<code>sequence</code> (when feature is
	<code>repeatmasker</code>): the name/class/family of the
	repeat, e.g. <code>AluJb/SINE/Alu</code>
<li>	<code>color</code>: a <a href="#gencolor">color</a>
	specification in UCSC format, e.g. <code>0,0,255</code>
<li>	<code>url</code> or <code>ucsc_id</code>: the URL for
	linkbars; <code>$$</code> will be replaced with the value of
	<code>name</code>
</ul>
<p class=tiny>
These keywords are not case-sensitive, but they cannot have
multiple values.
<p>
<p class=subhdr>
<a name="custom"><b>Custom Tracks</b></a>
<p>
Along with the basic formats listed above, Gmaj also supports UCSC
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks"
>custom track</a> headers.
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TRACK"
>Track lines</a> can specify certain settings for an entire
track; currently <code><a href="#gencolor">color</a></code>,
<code><a href="#gencolor">itemRgb</a></code>, <code>offset</code>,
and <code>url</code> are supported.  They also allow several
tracks (even in mixed formats) to be combined in a single file.
Gmaj does not currently provide a way to use just one particular
track from such a file (it will be treated as one big bag of
annotations), but lines in unsupported formats such as
<a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html"
>WIG</a> are gracefully skipped.
<a href="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#lines"
>Browser lines</a> are also skipped; Gmaj's initial zoom position
is controlled by command-line or applet parameters rather than by
individual annotation files.
<p>
<p class=subhdr>
<a name="multiseq"><b>Multiple Sequences</b></a>
<p>
Generic files can also contain annotations for several sequences,
because unlike the PipMaker-style formats, they all have a
"seqname" or "chrom" field that Gmaj can use to select the
appropriate lines.  Ideally Gmaj expects this field to match
the sequence name from the <a href="#align">alignment files</a>,
but has two ways to deal with exceptions.  If there is only one
seqname in the annotation file, then Gmaj will go ahead and use
it, but will display a warning (unless the mismatch can be fixed
by prepending the organism name, or the organism name plus
<code>chr</code>, to the annotation seqname).  But if the file
has annotations for several sequences and some don't match the
alignment files, you need to tell Gmaj which is which by adding
an alias in the <a href="#param">parameters file</a> (see
<code><a href="sample.gmaj">sample.gmaj</a></code>).
<p>
<p class=subhdr>
<a name="reuse"><b>Reusing Files</b></a>
<p>
One of the advantages of using generic formats is that files can
be reused in multiple panels without reformatting, e.g. as both
exons and underlays.  Normally linkbars, underlays, and text
highlights are simply handled as arbitrary regions of a specified
color, since they could represent any type of biological feature.
However, you can ask Gmaj to interpret them as exons or repeats
by adding a type hint in the <a href="#param">parameters file</a>
(see <code><a href="sample.gmaj">sample.gmaj</a></code>).  Note
that currently this will also cause any <a href="#gencolor"
>specified colors</a> in that file to be overridden with Gmaj's
defaults.
<p>
Combining several biological types of annotations (e.g. exons
and repeats) in one file is possible, but not recommended.  Gmaj
will try to skip lines that are not appropriate for the type it
is seeking, but it may draw more than you want.
<p>
<p class=subhdr>
<a name="cds"><b>Coding Sequence</b></a>
<p>
Currently Gmaj has no special support for multiple transcripts.
When inferring UTRs, all of the CDS-related items for a single
gene name are combined, and the interval from the lowest
coordinate to the highest is used as the CDS.  Also, some of the
formats' rules specify whether or not the initiation and stop
codons should be included in the CDS, but Gmaj does not make
adjustments to compensate for that; instead it simply includes
all of the given endpoints in the CDS.
<!-- and leaves it up to the user to interpret the display based
on the convention used in the files he/she provided.  [the user
does not supply files for applets] -->
<p>
<p class=subhdr>
<a name="gencolor"><b>Colors</b></a>
<p>
Colors can be specified for individual annotation lines via the
<code>itemRgb</code> field (for BED) or a <code>color</code>
attribute (for GFF v2 or GTF).  However, for <a href="#custom"
>custom tracks</a>, these are governed by the track line's
<code>itemRgb</code> attribute, which defaults to off per the
UCSC specification.  Thus if you have track lines and want to
use the per-item colors, you need to include
<code>itemRgb=On</code> in the track attributes.
<p>
Track lines can also have a <code>color</code> attribute for
the entire track, which will be used if <code>itemRgb</code> is
off, or if an individual item does not have its own color.
However in a rare break from the UCSC specification, Gmaj does
not use black as the default if the track color is unspecified
(black underlays and highlights just don't work with black plots
and text).  Instead it uses its own default colors, which for
genes/exons are the same as the colors for <a href="#high"
>default highlights</a>, or light gray for other annotations.
Note that these defaults will also override your colors when
<a href="#reuse">type hints</a> are used.
<p>
All of the above-mentioned color values are specified in UCSC
format, which consists of three comma-separated RGB values from
0-255 (e.g. <code>0,0,255</code>).
<p>
<p class=subhdr>
<a name="sort"><b>Sorting</b></a>
<p>
The order of the lines is not supposed to matter in these generic
formats, but for most of the Gmaj panels it does matter:  exons
need to be grouped by gene and ordered by position so UTRs can be
inferred and exon numbers assigned, early underlays are covered
up by later ones, etc.  Gmaj solves this problem by sorting the
data before it is displayed.  Exons are sorted first by gene name
in ascending order, and then within each gene by start position
(ascending) and lastly in case of a tie, by end position
(descending).  All other annotation types are sorted first by
length in descending order, and then in case of a tie by start
position (ascending).  This usually produces a reasonable display,
but if you need direct control of the order, you can use the
PipMaker-style formats instead.
<p>

<p class=vvlarge>
<hr>
<i>Cathy Riemer, June 2008</i>

<p class=scrollspace>
</body>
</html>
